SMOOTHNESS PRIORS AND THE DISTRIBUTED LAG ESTIMATOR*

by 

Hirotugu Akaike

1.  INTRODUCTION 

Shiller [4] introduced the concept of smoothness priors to develop a
Bayesian estimator of the lag coefficients, or the sequence of impulse
responses, of a linear system. In this approach the prior preference for
smoothness of the pattern of a set of lag coefficients is expressed
by a spherical normal distribution for the fixed order differences of
the lag coefficients. With an appropriate choice of the parameter of
the smoothness prior it is found that Shiller's distributed lag estimator
can produce results which satisfy our prior, or psychological, expectation
of the smoothness of the pattern of the lag coefficients.

One practically significant problem in the application of Shiller's
estimator is the choice of the variance of the spherical normal prior
distribution. In this paper we propose a practical solution to this
problem which is obtained by maximizing the likelihood of the Bayesian
model with respect to the hyperparameter, the variance. A serious problem
about Shiller's approach is whether the assumption of smoothness of the
lag coefficients is a natural one. When the number of significant
coefficients is expected to be small, which is often the case with
quarterly or annual economic data, this smoothness assumption may not
be quite adequate.

*The author is grateful to Professor T. W. Anderson for the stimulus
leading to the work reported in this paper. Thanks are due to Ms. E.
... partly supported by the Office of Naval Research Contract
N00014-75-C-0442 in the Department of Statistics, Stanford University.

A natural expectation for this type of situation seems to be the
smoothness of the behavior of the system characteristic in the frequency
domain. The integral of the squared absolute value of the derivative
of the frequency response function will then provide a measure of
smoothness. This measure takes a particularly simple form in terms of the
lag coefficients and leads to the definition of a non-spherical normal
smoothness prior.


2. SMOOTHNESS IN FREQUENCY DOMAIN
Consider  the  stochastic  linear  system 


(1)    $y_n = \sum_{m=0}^{M} a_m x_{n-m} + w_n$ ,


where $y_n$, $x_n$, and $w_n$ denote the output, input and error term of
the system, respectively. Here $\{w_n\}$ is assumed to be a sequence of
random variables which are independent of $\{x_n\}$ and are independently
identically distributed as normal with mean zero and variance $\sigma^2$.
The frequency response function of the output $y_n$ to the input $x_n$
is defined by


(2)    $A(f) = \sum_{m=0}^{M} a_m \exp(-i 2\pi m f)$ .
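As a numerical illustration of the smoothness measure built on (2), the following minimal sketch compares the integral of the squared absolute value of dA(f)/df with the closed form (2 pi)^2 sum m^2 a_m^2. The integration range -1/2 <= f <= 1/2, the function names and the sample coefficients are assumptions made for this sketch and are not taken from the text.

```python
import cmath
import math

def frequency_response(a, f):
    """A(f) = sum_m a[m] * exp(-i 2 pi m f), as in equation (2)."""
    return sum(a_m * cmath.exp(-2j * math.pi * m * f) for m, a_m in enumerate(a))

def smoothness_closed_form(a):
    """(2 pi)^2 * sum_m m^2 a[m]^2, the lag-coefficient form of the measure."""
    return (2 * math.pi) ** 2 * sum((m * a_m) ** 2 for m, a_m in enumerate(a))

def smoothness_numerical(a, grid=4096):
    """Midpoint-rule integral of |dA/df|^2 over the assumed range [-1/2, 1/2]."""
    total = 0.0
    for k in range(grid):
        f = -0.5 + (k + 0.5) / grid
        # Term-by-term derivative of A(f) with respect to f.
        dA = sum(-2j * math.pi * m * a_m * cmath.exp(-2j * math.pi * m * f)
                 for m, a_m in enumerate(a))
        total += abs(dA) ** 2
    return total / grid  # the interval has length 1

a = [1.2, -0.6, 0.4]  # illustrative lag coefficients
closed = smoothness_closed_form(a)
numeric = smoothness_numerical(a)
print(closed, numeric)
```

The two values agree up to rounding, because the cross terms of the squared derivative integrate to zero over one period. Only the weights m^2 on the squared lag coefficients remain, which is why the measure leads to a non-spherical normal prior.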


By a proper choice of the constant $\lambda$, this minimization leads to
the determination of the $a_m$'s with a small sum of squares of the
residuals and also with smooth behavior of the frequency response function.

The minimization problem is transformed into a statistical problem
by embedding it into the maximization problem of


For a given $\sigma$, the first factor of (4) obviously represents the
likelihood of the $a_m$'s. The second factor represents our prior preference
on the $a_m$'s in the form of a normal distribution. Our preference on
$a_0$ and $\sigma$ is not specified in (4) and we assume the improper prior

    $\frac{da_0}{\sigma} \frac{d\sigma}{\sigma}$

for $a_0$ and $\sigma$. This choice is based on the consideration that it
is an ignorance prior of the parameters of a normal distribution
$N(c a_0, \sigma)$, where $c$ is a nonzero constant. For the justification
of the use of this ignorance prior distribution for its impartial
performance, see Akaike [1].

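Our Bayesian model is thus defined by the data distribution

(5)    $p(\mathbf{y} | \mathbf{a}, \sigma) = \left( \frac{1}{2\pi\sigma^2} \right)^{N/2} \exp\left( -\frac{1}{2\sigma^2} \| \mathbf{y} - X\mathbf{a} \|^2 \right)$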

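and the (improper) prior distribution

(6)    $p(\mathbf{a}, \sigma | \lambda) = \left( \frac{1}{2\pi} \right)^{(M+1)/2} \left( \frac{1}{\sigma} \right)^{M+2} |\lambda^2 DD'|^{1/2} \exp\left( -\frac{\lambda^2}{2\sigma^2} \| D\mathbf{a} \|^2 \right)$ ,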

where $\mathbf{y} = (y_1, y_2, \ldots, y_N)'$, $\mathbf{a} = (a_0, \ldots, a_M)'$ and $X$ and $D$
are respectively $N \times (M+1)$ and $M \times (M+1)$ matrices defined by


and where $\|\mathbf{y}\|$ denotes the Euclidean norm of a vector $\mathbf{y}$ and $|A|$
denotes the determinant of a matrix $A$ and $'$ denotes transposition.


4. DETERMINATION OF THE HYPERPARAMETER
In his original work Shiller [4] demonstrated by a numerical example
the insensitivity of his estimate to the choice of the hyperparameter of
his smoothness prior. This hyperparameter corresponds to $\lambda$ of (6).
We often consider this insensitivity as a proof of the robustness of the
Bayesian model and take it as a justification for an arbitrary choice of
the value of the hyperparameter. Nevertheless, it is only when we know
the true lag coefficients that we can see that the estimates thus
obtained do approximate the true coefficients. Thus it is desirable to
have an objectively defined procedure for the selection of the value of
the hyperparameter.
If the prior distribution $p(\mathbf{a}, \sigma | \lambda)$ is proper, the likelihood of
the hyperparameter $\lambda$ with respect to the data $\mathbf{y}$ is defined by

(7)    $p(\mathbf{y} | \lambda) = \int p(\mathbf{y} | \mathbf{a}, \sigma) \, p(\mathbf{a}, \sigma | \lambda) \, d\mathbf{a} \, d\sigma$ .

A reasonable choice of $\lambda$ is realized by maximizing the likelihood
$p(\mathbf{y} | \lambda)$. This procedure for the determination of a hyperparameter is
called the method of Type II maximum likelihood by Good [3]. In our
case the distribution $p(\mathbf{a}, \sigma | \lambda)$ given by (6) contains an improper
component. Nevertheless, for each particular situation, the improper
component can be considered as an approximation to a very widely dispersed
proper prior distribution. Thus we take (7) as the definition of the
likelihood of the prior distribution defined by (6) and propose the use
of the $\lambda$ which maximizes $p(\mathbf{y} | \lambda)$. Obviously it would be more desirable
to develop an ignorance-type prior of $\lambda$, but at present we leave this
as a subject of further research.

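For the present Bayesian model defined by (5) and (6) we have

    $p(\mathbf{y} | \mathbf{a}, \sigma) \, p(\mathbf{a}, \sigma | \lambda) = \left( \frac{1}{2\pi} \right)^{(N+M+1)/2} |\lambda^2 DD'|^{1/2} \left( \frac{1}{\sigma} \right)^{N+M+2} \exp\left[ -\frac{1}{2\sigma^2} (\mathbf{a} - \mathbf{a}_0)' (X'X + \lambda^2 D'D) (\mathbf{a} - \mathbf{a}_0) \right] \exp\left[ -\frac{S(\lambda)}{2\sigma^2} \right]$ ,

where $\mathbf{a}_0 = (X'X + \lambda^2 D'D)^{-1} X'\mathbf{y}$ and $S(\lambda) = \|\mathbf{y}\|^2 - \mathbf{a}_0' (X'X + \lambda^2 D'D) \mathbf{a}_0$.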


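Thus we have

    $p(\mathbf{y} | \lambda) = \frac{1}{2} \left( \frac{1}{\pi} \right)^{N/2} \Gamma\left( \frac{N}{2} \right) S(\lambda)^{-N/2} \, |X'X + \lambda^2 D'D|^{-1/2} \, |\lambda^2 DD'|^{1/2}$ .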

By applying the formal Bayes procedure we get the posterior means of $\mathbf{a}$
and $\sigma^2$. They are given by $\mathbf{a}_0$ and $S(\lambda)/(N-2)$, respectively.
Computationally, $\mathbf{a}_0$ and $S(\lambda)$ can be obtained directly by using the
relation

(8)    $S(\lambda) = \min_{\mathbf{a}} \| \mathbf{y}^* - X^* \mathbf{a} \|^2 = \| \mathbf{y}^* - X^* \mathbf{a}_0 \|^2$ ,

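where $\mathbf{y}^* = (\mathbf{y}', \mathbf{0}')'$ is $\mathbf{y}$ extended by $M$ zeros and $X^* = (X', \lambda D')'$ is
$X$ with $\lambda D$ stacked below it.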


The search for the $\lambda$ which maximizes $p(\mathbf{y} | \lambda)$ may practically be limited
to a set of finite discrete values of $\lambda$. Since we are familiar with
the log likelihood ratio test statistic, we replace $p(\mathbf{y} | \lambda)$ by
$(-2) \log p(\mathbf{y} | \lambda)$ and try to minimize it. Ignoring an additive constant,
$(-2) \log p(\mathbf{y} | \lambda)$ is given by

(9)    $L(\lambda) = N \log S(\lambda) + \log |X'X + \lambda^2 D'D| - \log |\lambda^2 DD'|$ .
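The computation of (9) over a discrete set of values of the hyperparameter can be sketched as follows. This is only an illustration under stated assumptions: the matrix D is taken to be [0 | diag(1, ..., M)] with the factor 2 pi absorbed into the hyperparameter, X is taken to be the matrix of lagged inputs, the input is synthetic Gaussian noise rather than Shiller's series, and M = 5 is used instead of 19 to keep the run short. The helper names are hypothetical.

```python
import math
import random

def cholesky(A):
    """Lower-triangular C with A = C C' for a symmetric positive definite A."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C

def cholesky_solve(C, b):
    """Solve (C C') a = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(C[i][k] * z[k] for k in range(i))) / C[i][i]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(C[k][i] * a[k] for k in range(i + 1, n))) / C[i][i]
    return a

def criterion(y, X, lam):
    """Return (L(lam), a_hat, S(lam)) for the assumed D = [0 | diag(1..M)]."""
    N, p = len(y), len(X[0])                      # p = M + 1
    G = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    for m in range(1, p):
        G[m][m] += (lam * m) ** 2                 # X'X + lam^2 D'D
    Xty = [sum(row[i] * yn for row, yn in zip(X, y)) for i in range(p)]
    C = cholesky(G)
    a_hat = cholesky_solve(C, Xty)
    resid = sum((yn - sum(r * a for r, a in zip(row, a_hat))) ** 2
                for row, yn in zip(X, y))
    penalty = sum((lam * m * a_hat[m]) ** 2 for m in range(1, p))
    S = resid + penalty                           # minimized penalized sum of squares
    logdet_G = 2.0 * sum(math.log(C[i][i]) for i in range(p))
    logdet_prior = 2.0 * sum(math.log(lam * m) for m in range(1, p))  # log|lam^2 DD'|
    return N * math.log(S) + logdet_G - logdet_prior, a_hat, S

# Synthetic data (assumption): white Gaussian input, noise s.d. 0.05.
random.seed(0)
N, M = 40, 5
true_a = [1.2, -0.6, 0.4] + [0.0] * (M - 2)
x = [random.gauss(0.0, 1.0) for _ in range(N + M)]
X = [[x[n + M - m] for m in range(M + 1)] for n in range(N)]   # row n holds x_t, ..., x_{t-M}
y = [sum(a * v for a, v in zip(true_a, row)) + random.gauss(0.0, 0.05) for row in X]

grid = [10.0 * 2.0 ** k for k in range(-9, 0)]    # lam = 10 * 2^k, k = -9, ..., -1
results = [(lam,) + criterion(y, X, lam) for lam in grid]
best_lam, best_L, best_a, best_S = min(results, key=lambda r: r[1])
print(best_lam, [round(v, 3) for v in best_a])
```

The sketch evaluates S at the minimizing coefficient vector as the residual sum of squares plus the penalty, which is the same quantity as the augmented least squares residual in (8). A Cholesky factor is used because it yields both the solution and the log determinant in one pass.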

Akaike [2] reached the same procedure for the determination of $\lambda$ by
using the maximum likelihood estimate of $\sigma$ instead of developing the
improper prior distribution of $\sigma$.

5.  NUMERICAL  EXAMPLES 

To check the performance of our procedure it was applied to the data
generated by the relation

    $y_n = 1.2 x_n - 0.6 x_{n-1} + 0.4 x_{n-2} + w_n$ ,

where the input $x_n$ was the same as that used in the second example of
Shiller [4] and $w_n$ was normal with mean zero and $\sigma = 0.05$. The length
$N$ of the time series was 40. The Bayesian model was developed with the
highest lag $M = 19$. The search for the minimum of $L(\lambda)$ was limited
to the values $\lambda = 10 \times 2^k$ $(k = -9, -8, \ldots, -1)$ and the minimum was
attained with $k = -5$. The corresponding estimate is given in
Table 1 along with the true parameter and the estimate obtained by
assuming Shiller's smoothness prior for the second order differences
$(a_{i+2} - 2a_{i+1} + a_i)$ $(i = 0, 1, \ldots, M-2)$. The estimate denoted
by Shiller was obtained by putting


in (6) and then minimizing the corresponding $L(\lambda)$ for $\lambda = 10 \times 2^k$
$(k = -5, -4, \ldots, 10)$. The minimum was attained at $k = 10$. Further
increase of $k$ would produce estimates smoother than that given in Table 1.
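To make the second order difference prior concrete, here is a minimal sketch of a second-difference operator acting on the lag coefficients. The (M-1) x (M+1) shape and the function names are assumptions of this sketch; the matrix actually substituted into (6) is not reproduced in the text above.

```python
def second_difference_matrix(M):
    """Rows give a[i+2] - 2 a[i+1] + a[i] for i = 0, ..., M-2; shape (M-1) x (M+1)."""
    D = []
    for i in range(M - 1):
        row = [0.0] * (M + 1)
        row[i], row[i + 1], row[i + 2] = 1.0, -2.0, 1.0
        D.append(row)
    return D

def apply(D, a):
    """Matrix-vector product D a for a list-of-rows matrix."""
    return [sum(d * v for d, v in zip(row, a)) for row in D]

M = 19
D2 = second_difference_matrix(M)
# An exactly linear lag pattern (illustrative numbers) has zero second differences.
line = [0.2 - 0.016 * m for m in range(M + 1)]
largest = max(abs(v) for v in apply(D2, line))
print(len(D2), len(D2[0]), largest)
```

A linear pattern is not penalized at all by such a prior, so a very large hyperparameter pushes the estimate toward a straight line. This appears consistent with the nearly constant step of about 0.016 between successive Shiller estimates in Table 1.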


Table 1

Comparison of estimates. $\mathbf{a}_0$ denotes the estimate obtained by the
smoothness prior in the frequency domain. Shiller denotes the estimate
obtained by assuming Shiller's prior for the second order differences.

m                   0        1        2        3        4
True                1.2     -0.6      0.4      0        0
$\mathbf{a}_0$      1.137   -0.440    0.230    0.050    0.017
Shiller             0.203    0.186    0.170    0.154    0.137

m                   5        6        7        8        9
True                0        0        0        0        0
$\mathbf{a}_0$      0.012   -0.000    0.003    0.001   -0.002
Shiller             0.121    0.105    0.089    0.072    0.056

m                  10       11       12       13       14
True                0        0        0        0        0
$\mathbf{a}_0$      0.001    0.003   -0.000   -0.000   -0.003
Shiller             0.040    0.023    0.007   -0.009   -0.025

m                  15       16       17       18       19
True                0        0        0        0        0
$\mathbf{a}_0$     -0.005   -0.006   -0.001    0.004    0.001
Shiller            -0.042   -0.058   -0.074   -0.090   -0.048

The result given in Table 1 demonstrates the necessity of great
care in applying Bayesian models. Although the Shiller estimate is
definitely in accordance with the prior preference of smoothness, it
gives a definitely biased image of the true impulse response sequence.
The sum of squared errors of an estimate $\hat{\mathbf{a}}$ is defined by
SSE $= \| \hat{\mathbf{a}} - \mathbf{a} \|^2$, where $\mathbf{a}$ denotes the true parameter. The values
of SSE are 0.06 and 1.77 for our estimate and the Shiller estimate,
respectively. Another measure of inaccuracy of an estimate $\hat{\mathbf{a}}$ is the
sum of squared errors of regression defined by

    SSER $= \| X(\hat{\mathbf{a}} - \mathbf{a}) \|^2$ .

The values of SSER are 0.0008 and 0.1955 for our estimate and the
Shiller estimate, respectively. Thus the bias introduced by assuming
Shiller's smoothness prior seems to be producing significant influence
on the estimation of regression. The result shown in Table 1 is the
one obtained in a series of three experiments all of which
produced similar results.
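The SSE values quoted above can be recomputed from the entries of Table 1. The sketch below does only that; the variable names are chosen for this illustration. SSER is not recomputed, because it needs the input matrix X, which is not given here.

```python
# True lag coefficients and the two estimates, transcribed from Table 1 (m = 0, ..., 19).
true_a = [1.2, -0.6, 0.4] + [0.0] * 17
ours = [1.137, -0.440, 0.230, 0.050, 0.017, 0.012, -0.000, 0.003, 0.001, -0.002,
        0.001, 0.003, -0.000, -0.000, -0.003, -0.005, -0.006, -0.001, 0.004, 0.001]
shiller = [0.203, 0.186, 0.170, 0.154, 0.137, 0.121, 0.105, 0.089, 0.072, 0.056,
           0.040, 0.023, 0.007, -0.009, -0.025, -0.042, -0.058, -0.074, -0.090, -0.048]

def sse(estimate, truth):
    """Sum of squared errors between an estimate and the true coefficients."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

sse_ours = sse(ours, true_a)
sse_shiller = sse(shiller, true_a)
print(round(sse_ours, 2), round(sse_shiller, 2))  # prints 0.06 1.77
```

Almost all of the Shiller SSE comes from the first two lags, where the smooth estimate cannot follow the jump from 1.2 to -0.6.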

One might argue that the situation will be reversed when the
present prior is applied to the case where Shiller's prior is
appropriate. To check this possibility the estimates based on the two
priors were obtained for a sample from the model treated in Shiller's
second example. The result is shown in Table 2. For the purpose of
comparison the ordinary least squares estimate, denoted by LS, is also
included. The result demonstrates the extremely good performance of
the Shiller estimate. This result is already reported in Akaike [2]
and is considered as a demonstration of the performance of our procedure
for the selection of $\lambda$. The present result also shows that our
estimate does not show any significant bias and that it is closer to the
Shiller estimate than to the least squares estimate. The sum of squared
errors of the estimate, SSE, is 0.0117 for our estimate, while it is
0.0026 for the Shiller estimate. Nevertheless, the sum of squared
errors of regression, SSER, is 0.00047 for our estimate, while it takes
a larger value 0.00051 for the Shiller estimate. This suggests that
the use of the present prior in the situation where Shiller's prior is
more appropriate may not necessarily induce significant loss in terms
of the accuracy of regression.


6.  DISCUSSION 

The purpose of the present paper has been two-fold. The first is
to introduce a new smoothness prior which might find a wider
applicability than Shiller's original smoothness prior. The second is to
propose a practically useful, objectively defined procedure for the
choice of the hyperparameter. The numerical results given in the
preceding section show that these objectives are attained fairly well.

One of the most significant results is the numerical demonstration
of the possible significant bias due to the Bayesian modeling. In the
Bayesian approach we specify our psychological expectation in the form
of a prior distribution and proceed to get a result which satisfies
this expectation. Our numerical results dramatically demonstrated
that this satisfaction of psychological expectation may be quite
deceptive and misleading. Thus we see that no unique Bayesian model
can be assured of success in its practical application. The only
reasonable procedure would then be to propose several possible Bayesian
models and compare the likelihoods of these models, and, if necessary
and feasible, develop a larger Bayesian model using these models.

Certainly the smoothness prior in the frequency domain introduced
in this paper is not quite satisfactory to cover every practical
situation. When some definite delay in the response is conceivable, i.e.,
when $a_0, \ldots, a_d$ are expected to be close to zero for some $d$ $(< M)$,
it may be more appropriate to assume a prior distribution over the
possible delays $d$ and, for each $d$, replace the definition of $x_n$
by $x_{n-d-1}$. Practically it may be better to replace $A(f)$ in the
definition (3) of $R$ by $\exp(i 2\pi f \delta) A(f)$ defined with some properly
chosen integer $\delta$. This replaces $R$ of (3) by

    $(2\pi)^2 \sum_{m=0}^{M} (m - \delta)^2 a_m^2$ ,

which clearly demonstrates the effect of the choice of $\delta$. Since we
do not know which value of $\delta$ we should use, we will assume a prior
distribution over a possible set of values of $\delta$, such as
$(-1, 0, 1, 2, 3)$, and define the "likelihood" of each model by

    $\exp\left[ -\frac{1}{2} \min_{\lambda} L_\delta(\lambda) \right]$ ,

where $\min_{\lambda} L_\delta(\lambda)$ denotes the minimum of $L(\lambda)$ defined by (9) for a
specific choice of $\delta$. With a proper modification of $D$ in (6) we
may even assume $\delta$ to be half integers, such as -0.5, 0.5, 1.5, ... .
The practical utility of this type of approach needs further
investigation.
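The comparison of models with different shifts can be sketched as follows. The sketch assumes that the "likelihood" of each shift is exp(-L/2) evaluated at the hyperparameter value that minimizes L of (9), since L is -2 log likelihood up to an additive constant. The criterion values, the uniform prior over shifts and the function names are hypothetical.

```python
import math

def shifted_smoothness(a, delta):
    """(2 pi)^2 * sum_m (m - delta)^2 a[m]^2, the shifted smoothness measure."""
    return (2 * math.pi) ** 2 * sum((m - delta) ** 2 * a_m ** 2
                                    for m, a_m in enumerate(a))

def model_weights(min_L_by_delta, prior=None):
    """Normalized prior-times-likelihood weights, with likelihood exp(-L / 2)."""
    deltas = sorted(min_L_by_delta)
    if prior is None:
        prior = {d: 1.0 / len(deltas) for d in deltas}   # uniform prior (assumption)
    base = min(min_L_by_delta.values())                  # subtract to avoid underflow
    raw = {d: prior[d] * math.exp(-0.5 * (min_L_by_delta[d] - base)) for d in deltas}
    total = sum(raw.values())
    return {d: raw[d] / total for d in deltas}

# Hypothetical minimized criterion values for delta in (-1, 0, 1, 2, 3).
hypothetical_min_L = {-1: 12.4, 0: 10.1, 1: 9.3, 2: 11.8, 3: 15.0}
weights = model_weights(hypothetical_min_L)
best_delta = max(weights, key=weights.get)
print(best_delta, round(weights[best_delta], 3))
```

A response concentrated at lag 2 costs nothing under a shift of 2, for example `shifted_smoothness([0.0, 0.0, 1.0], 2)` is zero, which illustrates how the choice of shift moves the region where large coefficients are tolerated.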

What is practically more important than the refinement of the
Bayesian model is the recognition of the limited applicability of the
basic model (1) to economic time series. This is due to the fact that
there is usually a feedback from the output $y_n$ to the input $x_n$.
Only in the special situation where this feedback is negligible can we
expect the use of the present model. Thus the result reported in this
paper must be considered as only the second step following the first
step of the original contribution of Shiller in making the Bayesian
modeling a practical procedure. Further elaboration of the basic model
is definitely necessary to make the procedure widely applicable to the
analysis of complex economic time series.